PCA integration method implemented + tested #265

Open
gregothebyteknight wants to merge 7 commits into nf-core:dev from gregothebyteknight:dev

@gregothebyteknight

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/scdownstream branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

kim-fehl and others added 3 commits March 11, 2026 19:03
- add pca to integration methods (schema/docs/cellxgene list)
- wire SCANPY_PCA into INTEGRATE and publish outputs
- use X_emb for PCA in integrated AnnData for downstream consistency
Collaborator

@nictru left a comment

Looks good so far! Can you just create a dedicated test in the INTEGRATE subworkflow?

Comment on lines +1 to +9
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.14
  - pip
  - pip:
      - scArches==0.6.1
      - anndata==0.9.2
No newline at end of file

Collaborator

This environment uses an outdated version of anndata - is that necessary to get expimap working?

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am trying to resolve an anndata problem. The first module test exposed an error with anndata.
Traceback (most recent call last):
  File ".command.sh", line 10, in <module>
    import scarches as sca
  File "/opt/conda/lib/python3.14/site-packages/scarches/__init__.py", line 1, in <module>
    from . import dataset, metrics, trainers, models, zenodo, plotting, utils, classifiers
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/__init__.py", line 1, in <module>
    from .trvae.trvae import trVAE
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/trvae/trvae.py", line 11, in <module>
    from ..base._base import CVAELatentsModelMixin
  File "/opt/conda/lib/python3.14/site-packages/scarches/models/base/_base.py", line 10, in <module>
    from anndata import AnnData, read
ImportError: cannot import name 'read' from 'anndata' (/opt/conda/lib/python3.14/site-packages/anndata/__init__.py)
I read that this issue might be caused by scArches==0.6.1 not being compatible with the most recent versions of anndata.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're talking about this

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit annoying in this context: we would need to install from the master branch, but with Seqera Containers we can only install published versions.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> Ah you're talking about this

Exactly. For some reason, with scArches==0.6.1 (which should be the most recent release) I get this anndata error.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the older anndata version will cause other compatibility issues.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think actually monkey-patching could be the cleanest solution
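
A minimal sketch of what such a monkey-patch could look like, assuming the missing `anndata.read` was only a backwards-compatible alias of `anndata.read_h5ad`. The helper name `ensure_alias` and the stand-in module are hypothetical and are only there so the snippet runs without anndata or scArches installed:

```python
import types


def ensure_alias(module, alias, target):
    """Expose module.<target> under module.<alias> if the alias is missing.

    Returns True when the alias was added, False when it already existed.
    """
    if hasattr(module, alias):
        return False
    setattr(module, alias, getattr(module, target))
    return True


# Stand-in for an anndata release that no longer exports `read` (hypothetical).
fake_anndata = types.ModuleType("fake_anndata")
fake_anndata.read_h5ad = lambda path: f"loaded {path}"

patched = ensure_alias(fake_anndata, "read", "read_h5ad")
print(patched, fake_anndata.read("ref.h5ad"))
```

In the module script the same call would presumably go before the failing import: `import anndata`, then `ensure_alias(anndata, "read", "read_h5ad")`, and only then `import scarches as sca`, because the alias has to exist before scArches executes `from anndata import AnnData, read`.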

Comment on lines -7 to +8
-  - conda-forge::python=3.12.11
-  - conda-forge::pyyaml=6.0.2
-  - conda-forge::scanpy=1.11.2
+  - conda-forge::pyyaml=6.0.3
+  - conda-forge::scanpy=1.11.5

Collaborator

Not sure if this is even relevant here, but might be interesting for you

If you remove python from the explicit (versioned) dependencies, then the Python version might be inconsistent for users who run the pipeline with conda environments. This is not really a problem in itself, but if the Python version is included in the version capture at the end of the script, the tests start failing.
Thus, either pin the Python version or remove Python from the version capture.
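
The second option could look roughly like this. It is a minimal sketch, assuming the script collects versions into a dict before writing them out; `build_versions`, the process name and the package versions are placeholders, not the pipeline's actual code:

```python
import platform


def build_versions(process_name, package_versions, include_python=False):
    """Return the versions mapping for one process.

    Python is only recorded when include_python is True, so an unpinned
    interpreter cannot change the captured output.
    """
    entry = dict(package_versions)
    if include_python:
        entry["python"] = platform.python_version()
    return {process_name: entry}


# Hypothetical process name and illustrative package versions.
versions = build_versions("EXAMPLE_PROCESS", {"scanpy": "1.11.5", "pyyaml": "6.0.3"})
print(versions)
```

With `include_python` left at its default, the captured output depends only on the pinned packages, whichever interpreter conda resolves.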

# Initialization of the model with the reference network
intr_cvae = sca.models.EXPIMAP(
    adata=adata_processing,
    condition_key="${batch_col}",

Collaborator

Actually, if you provide the batch_col as the condition_key, you force the model to learn which gene programs explain the differences between the batches, which is not desirable.

The pipeline also has a condition_col, which should be used here instead.
